Quantifying and Resolving Remote Memory Access Contention on Hardware DSM Multiprocessors
Abstract
This paper makes two contributions. First, it proposes a new methodology for quantifying remote memory access contention on hardware DSM multiprocessors. The most valuable aspect of this methodology is that it assesses the impact of contention on real parallel programs running on real hardware. The methodology takes as input the number of accesses from each DSM node to each page in memory. A trace of the program's memory accesses, obtained at runtime from hardware counters, is used to compute an accurate estimate of the fraction of execution time wasted due to contention. Second, the paper presents a new algorithm that detects potential hot spots in pages and resolves contention on them using dynamic page migration. The algorithm balances remote memory accesses across the nodes of the system while trying to improve memory access locality. Experiments with five parallel codes with irregular memory access patterns on a 128-processor Origin2000 show that our algorithm reduces execution time by 27.7% on average.
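The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the general idea: take per-node, per-page access counts, find the node serving the most remote accesses, and migrate a page away from it when that lowers the peak load. The function names (`node_loads`, `rebalance`), the `imbalance` threshold, and the greedy selection rule are all assumptions made for illustration; they are not the paper's algorithm.

```python
from typing import List

def node_loads(counts: List[List[int]], home: List[int]) -> List[int]:
    """Remote accesses served by each node.

    counts[p][n] is the number of accesses from node n to page p;
    home[p] is the node whose memory currently holds page p.
    """
    loads = [0] * len(counts[0])
    for page, row in enumerate(counts):
        h = home[page]
        # Accesses from the home node itself are local and cause no remote load.
        loads[h] += sum(c for node, c in enumerate(row) if node != h)
    return loads

def rebalance(counts: List[List[int]], home: List[int],
              imbalance: float = 1.5, max_moves: int = 10) -> List[int]:
    """Greedy page migration (hypothetical policy, not the paper's).

    While the busiest node serves more than `imbalance` times the mean
    remote load, move the page whose migration gives the lowest resulting
    load on the two nodes involved. Moving a page to a node that accesses
    it turns that node's accesses into local ones.
    """
    home = list(home)
    n_nodes = len(counts[0])
    for _ in range(max_moves):
        loads = node_loads(counts, home)
        mean = sum(loads) / n_nodes
        hot = max(range(n_nodes), key=loads.__getitem__)
        if mean == 0 or loads[hot] <= imbalance * mean:
            break  # balanced enough under the assumed threshold
        best = None  # (resulting peak, page, target node)
        for page, row in enumerate(counts):
            if home[page] != hot:
                continue
            total = sum(row)
            for target in range(n_nodes):
                if target == hot:
                    continue
                new_hot = loads[hot] - (total - row[hot])
                new_target = loads[target] + (total - row[target])
                peak = max(new_hot, new_target)
                if peak < loads[hot] and (best is None or peak < best[0]):
                    best = (peak, page, target)
        if best is None:
            break  # no single migration lowers the peak
        home[best[1]] = best[2]
    return home

# Example: 4 pages, 3 nodes; node 0 initially holds the two contended pages.
counts = [[0, 50, 50], [0, 40, 0], [0, 10, 0], [0, 0, 10]]
new_home = rebalance(counts, [0, 0, 1, 2])
print(node_loads(counts, new_home))
```

In this sketch the access counts are a static snapshot; a runtime system would presumably re-read the hardware counters periodically and weigh the cost of each migration, which is not modeled here.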
Similar Resources
Quantifying contention and balancing memory load on hardware DSM multiprocessors
This paper makes the following contributions: It proposes a new methodology for quantifying remote memory access contention on hardware DSM multiprocessors. The most valuable aspect of this methodology is that it assesses the overhead of contention on real parallel programs running on real hardware. The methodology uses as input the number of accesses from each node of the DSM to each page in m...
Computation and Data Partitioning on Scalable Shared Memory Multiprocessors
In this paper we identify the factors that affect the derivation of computation and data partitions on scalable shared memory multiprocessors (SSMMs). We show that these factors necessitate an SSMM-conscious approach. In addition to remote memory access, which is the sole factor on distributed memory multiprocessors, cache affinity, memory contention and false sharing are important factors that...
Meshes vs. Hypercubes: A case study for Distributed Shared-memory Multiprocessors
Distributed shared-memory multiprocessors (DSMs) are gaining acceptance because they are easier to program than multicomputers. Recently proposed DSMs use a direct interconnection network to access remote memory locations, making these architectures scalable. Most DSMs implement a cache coherence protocol in hardware. This protocol exchanges data and control messages through the interconnection n...
Eager Combining: a Coherency Protocol for Increasing Effective Network and Memory Bandwidth in Shared-memory Multiprocessors
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory or interconnection network bandwidth. Even well-designed machines can exhibit bandwidth limitations when a program issues an excessive number of remote memory accesses or when remote accesses are distributed non-uniformly. While techniques for improving locality of reference are often successful...
Classifying and alleviating the communication overheads in matrix computations on large-scale NUMA multiprocessors
Large-scale, shared-memory multiprocessors have non-uniform memory access (NUMA) costs. High communication cost dominates the execution time of matrix computations. Memory contention and remote memory access are two major communication overheads on large-scale NUMA multiprocessors. However, previous experiments and discussions focus either on reducing the number of remote memory accesses...